Add green context support #1976
Restructure tests into fixtures + classes with full resource cleanup:

- Fixtures: sm_resource, wq_resource, green_ctx (with CUDAError skip), green_ctx_active (with try/finally restore), fill_kernel
- _use_green_ctx context manager for safe push/pop in all tests
- TestSMResourceQuery: properties, arch constraints per CC
- TestSMResourceSplit: single/two-group splits, discovery, alignment, dry-run vs real parity
- TestGreenContextKernelLaunch: compile + launch + verify in green ctx, two independent green contexts, SM + workqueue combined

All set_current calls are paired with restore in finally blocks to prevent context stack leaks on test failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
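For readers who want the push/pop pattern spelled out, here is a minimal sketch of a context manager in the spirit of `_use_green_ctx`. It is not the helper from the test suite: `FakeDevice` is a hypothetical stand-in, and the assumption that `set_current` hands back the previously current context is made only for this illustration.

```python
from contextlib import contextmanager


class FakeDevice:
    """Hypothetical stand-in for a device that tracks one current context."""

    def __init__(self, primary):
        self.current = primary

    def set_current(self, ctx):
        # Assumed behavior: install ctx and hand back the previous context.
        previous, self.current = self.current, ctx
        return previous


@contextmanager
def use_green_ctx(dev, green_ctx):
    """Make green_ctx current inside the block, then restore the old one."""
    previous = dev.set_current(green_ctx)
    try:
        yield green_ctx
    finally:
        # Runs on success and on failure, so the context is always restored.
        dev.set_current(previous)


dev = FakeDevice(primary="primary")
try:
    with use_green_ctx(dev, "green"):
        during = dev.current
        raise AssertionError("simulated test failure")
except AssertionError:
    pass
after = dev.current
```

Even though the body raises, `after` ends up as the original context, which is the property the `finally` pairing is meant to guarantee.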
/ok to test ac5c0fc
17c2be2 to 08d52d1
- Convert ContextOptions and SMResourceOptions/WorkqueueResourceOptions to cdef dataclasses for check_or_create_options compatibility.
- Cache SM metadata in typed cdef fields; fall back to arch-based granularity on CUDA 12.x where CUdevSmResource lacks minSmPartitionSize/smCoscheduledAlignment.
- Simplify Context to hold only ContextHandle (remove _h_green_ctx and _is_green fields). Green ctx association lives in ContextBox; is_green queries get_context_green_ctx() on demand.
- ContextOptions.resources accepts Sequence only (no bare resource).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
08d52d1 to 3013fe8
Switch from the push model (dev.set_current + dev.create_stream) to the explicit model (ctx.create_stream + ctx.resources) as the primary way to use green contexts.

Context.create_stream(options):
- Only supported on green contexts (raises on primary contexts).
- Delegates to Stream._init, which calls create_stream_handle in C++.
- C++ create_stream_handle auto-dispatches: checks get_context_green_ctx and calls cuGreenCtxStreamCreate for green contexts, or cuStreamCreateWithPriority for primary. Single function, no duplication.

Context.resources:
- Returns a DeviceResources namespace querying this context's resources (cuCtxGetDevResource / cuGreenCtxGetDevResource), not the full device.

dev.set_current(green_ctx) still works but is not the recommended path.

Tests rewritten to use the explicit model throughout. Push-model set_current kept as regression tests with _use_green_ctx helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
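The auto-dispatch described in this commit can be modeled in a few lines of Python. This is only a sketch of the decision logic, not the C++ implementation: the `ContextBox` dataclass here is a hypothetical simplification, and the function returns the name of the driver entry point it would call rather than calling the driver.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextBox:
    """Hypothetical model: a context plus its optional green-context owner."""
    name: str
    green_ctx: Optional[str] = None


def get_context_green_ctx(ctx):
    """Reverse lookup: the green context behind ctx, or None for primary."""
    return ctx.green_ctx


def create_stream_handle(ctx, priority=0):
    """Pick the driver entry point from the kind of context passed in."""
    green = get_context_green_ctx(ctx)
    if green is not None:
        return ("cuGreenCtxStreamCreate", green, priority)
    return ("cuStreamCreateWithPriority", ctx.name, priority)


primary = ContextBox(name="primary")
green = ContextBox(name="derived", green_ctx="green0")
primary_call = create_stream_handle(primary)
green_call = create_stream_handle(green, priority=-1)
```

Keeping the branch inside one function means callers never need to know which kind of context they hold, which matches the "single function, no duplication" note above.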
62e4883 to 3287204
- Let the driver validate the nonblocking flag for green context streams: cuGreenCtxStreamCreate rejects CU_STREAM_DEFAULT. On failure, check if the context is green + nonblocking is False and raise a clear ValueError.
- cuCtxGetStreamPriorityRange failure (CUDA_ERROR_INVALID_CONTEXT) now raises: "No current CUDA context. Call dev.set_current() before creating streams."
- C++ create_stream_handle returns CUDA_ERROR_NOT_SUPPORTED if the context is green but cuGreenCtxStreamCreate is unavailable (CUDA < 12.5), instead of falling through to cuStreamCreateWithPriority.
- ctx.resources.workqueue now dispatches to cuGreenCtxGetDevResource for green contexts, matching the SM query path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
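A rough sketch of the "let the driver validate, then translate" approach from the first bullet. Everything here is a stand-in: `CUDAError` and `driver_create_stream` are hypothetical models of the driver layer, and the error message wording is illustrative rather than the one in the PR.

```python
class CUDAError(RuntimeError):
    """Hypothetical stand-in for the driver error type."""


def driver_create_stream(is_green, nonblocking):
    # Model of the driver rule: green context streams must be non-blocking.
    if is_green and not nonblocking:
        raise CUDAError("CUDA_ERROR_INVALID_VALUE")
    return "stream"


def create_stream(is_green, nonblocking=True):
    try:
        return driver_create_stream(is_green, nonblocking)
    except CUDAError as exc:
        # Reinterpret the failure only when the known cause applies;
        # any other driver error propagates unchanged.
        if is_green and not nonblocking:
            raise ValueError(
                "Green context streams must be non-blocking; "
                "pass nonblocking=True."
            ) from exc
        raise
```

The check runs only on the failure path, so the common case pays nothing for the friendlier message.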
eebd4cf to 5989fd1
- dev.create_context raises ValueError (not NotImplementedError) when options or resources are missing.
- Cache version checks (_check_green_ctx_support, _check_workqueue_support) at module level; raise ValueError instead of NotImplementedError.
- Simplify _device_resources.pyx: merge _as_uint and _count_to_sm_count into _to_sm_count; inline unsigned int casts for coscheduled params.
- Add green context classes to api.rst (Context, ContextOptions, DeviceResources, SMResource, SMResourceOptions, WorkqueueResource, WorkqueueResourceOptions).
- Update all docstrings to NumPy style with Attributes/Parameters/Returns sections matching the existing codebase convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5989fd1 to fa254a5
/ok to test fa254a5
Andy-Jost left a comment
I see a few issues with the registries. It appears the context registry needs to be checked in one place.
I have a bigger concern with the stream registry. I don't see what problem it solves, and it appears to be corruptible through the user API.
    """True if this context was created from device resources."""
    if not self._h_context:
        return False
    return get_context_green_ctx(self._h_context).get() != NULL
nit: consider putting this in the handle API as is_green(self._h_context).
…td::vector

Review comment 1: Consolidate create_context_handle_from_green_ctx with create_context_handle_ref by adding a private overload that takes an optional GreenCtxHandle. The green ctx path now delegates to it after calling cuCtxFromGreenCtx, ensuring registry lookup and deduplication.

Review comments 2-4: Move GILReleaseGuard to the first line in create_green_ctx_handle and create_context_handle_from_green_ctx for consistency with the rest of the file.

Review comment 6: Keep is_green check inline in _context.pyx using get_context_green_ctx (cannot add a C++ is_green function across separate .so boundaries without linker issues).

Review comment 8: Replace malloc/free with std::vector<CUdevResource> in Device.create_context for automatic cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

/ok to test 41cf1de
Owner-backed stream handles (from create_stream_handle_with_owner) are no longer registered in the stream_registry. Multiple Python owners can wrap the same CUstream independently, each stacking its own Py_INCREF/Py_DECREF without competing for a single registry slot. The registry lookup at the top is preserved to reuse existing cuda-core-owned handles that carry context metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
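To make the registry behavior in this commit concrete, here is a small Python model. It is a sketch under stated assumptions rather than the C++ code: the `StreamBox` fields, the `create_stream_handle_ref` name, and the use of weak references inside `HandleRegistry` are all hypothetical choices made for the illustration.

```python
import weakref


class StreamBox:
    """Hypothetical model of a handle box keyed by a raw stream pointer."""

    def __init__(self, raw, ctx=None, owner=None):
        self.raw = raw
        self.ctx = ctx
        self.owner = owner


class HandleRegistry:
    """Deduplicates boxes by raw pointer without keeping them alive."""

    def __init__(self):
        self._by_raw = weakref.WeakValueDictionary()

    def lookup(self, raw):
        return self._by_raw.get(raw)

    def register(self, box):
        self._by_raw[box.raw] = box


stream_registry = HandleRegistry()


def create_stream_handle_ref(raw, ctx):
    """Library-owned path: reuse a registered box or register a new one."""
    existing = stream_registry.lookup(raw)
    if existing is not None:
        return existing
    box = StreamBox(raw, ctx=ctx)
    stream_registry.register(box)
    return box


def create_stream_handle_with_owner(raw, owner):
    """Owner-backed path: look up first, but never register the new box."""
    existing = stream_registry.lookup(raw)
    if existing is not None:
        return existing  # reuse the box that carries context metadata
    return StreamBox(raw, owner=owner)
```

With this shape, two owners wrapping the same raw stream get independent boxes, while a stream the library created itself is still found through the lookup.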
/ok to test 9d21b6b
@leofang, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
/ok to test 4d82262
/ok to test d840bdc
I am very impatient. My CI run was finally green after several retries, but someone cut my line and caused merge conflicts. I am going to admin-merge this PR, and leave any test failures to the next person to deal with.
Close #1563. Close #112.
Summary
Add green context support to cuda.core — the explicit-model API for querying device resources, splitting SMs, creating green contexts, and using them without touching the thread-local context stack.
Design
See the companion design doc for full rationale. Key decisions:
- `Context` type: no user-visible `GreenContext` subclass. A single `Context` wraps either a primary `CUcontext` or a `CUgreenCtx` + derived `CUcontext`. `ctx.is_green` distinguishes them. Inspired by the CUDA runtime's execution-context (EC) abstraction.
- `dev.resources` namespace: `DeviceResources` groups hardware resource queries (`dev.resources.sm`, `dev.resources.workqueue`). Follows the existing "plural = namespace" pattern (`dev.properties`, `kernel.attributes`).
- `ctx.resources` / `stream.resources`: same `DeviceResources` type, but queries the context's provisioned resources (`cuCtxGetDevResource` / `cuGreenCtxGetDevResource`) instead of the full device.
- `SMResourceOptions` with SoA broadcasting: single dataclass for `SMResource.split()`. Scalar fields broadcast; `count` drives the group count. `count=None` means discovery mode (translated to `smCount=0` internally).
- `WorkqueueResource` merges `CU_DEV_RESOURCE_TYPE_WORKQUEUE_CONFIG` and `CU_DEV_RESOURCE_TYPE_WORKQUEUE` under one user-facing class. Strings for option values (e.g. `sharing_scope="green_ctx_balanced"`).
- `ContextOptions(resources=[...])` → `dev.create_context()`: resource descriptor generation and `cuGreenCtxCreate` are internal. The user passes pre-split resource objects.
- `ctx.create_stream()` creates streams bound to a green context without calling `dev.set_current()`. The C++ handle layer auto-dispatches between `cuGreenCtxStreamCreate` and `cuStreamCreateWithPriority` based on the context type. Green context streams must be non-blocking.
- `ctx.close()` does not manage the context stack; closing a current context raises `RuntimeError`.
- `dev.set_current(green_ctx)` still works for backward compatibility but is not the recommended path.

New public API
- `Device.resources` → `DeviceResources` (namespace: `.sm`, `.workqueue`)
- `Context.resources` → `DeviceResources` (context-level query of provisioned resources)
- `Stream.resources` → `DeviceResources` (delegates to the stream's parent context)
- `Context.create_stream(options)` → `Stream` (green contexts only; raises on primary)
- `Context.is_green` → `bool`
- `SMResource`: properties `sm_count`, `min_partition_size`, `coscheduled_alignment`, `flags`, `handle`; method `split(options, *, dry_run=False)`
- `SMResourceOptions`: `count`, `coscheduled_sm_count`, `preferred_coscheduled_sm_count`
- `WorkqueueResource`: method `configure(options)`
- `WorkqueueResourceOptions`: `sharing_scope`
- `ContextOptions.resources`: accepts `Sequence[SMResource | WorkqueueResource]`

Implementation details
C++ handle layer (`resource_handles.hpp/cpp`):

- `GreenCtxHandle` (`shared_ptr<const CUgreenCtx>`): owning handle; destructor calls `cuGreenCtxDestroy`.
- `ContextBox` gains a `GreenCtxHandle` field so the derived `CUcontext` keeps the green ctx alive. `get_context_green_ctx()` provides reverse lookup.
- `create_green_ctx_handle()` combines `cuDevResourceGenerateDesc` + `cuGreenCtxCreate` in one call; the descriptor is transient (no `DevResourceDescHandle` needed since CUDA has no explicit destroy for it).
- `create_stream_handle()` auto-dispatches: checks `get_context_green_ctx()` on the provided `ContextHandle` and calls `cuGreenCtxStreamCreate` for green contexts, `cuStreamCreateWithPriority` for primary. Returns `CUDA_ERROR_NOT_SUPPORTED` if the context is green but `cuGreenCtxStreamCreate` is unavailable (CUDA < 12.5).
- `context_registry` / `stream_registry` (`HandleRegistry`) deduplicate handles by raw CUDA pointer, enabling identity-preserving `set_current` swaps.

Bug fix (stream context tracking):
- `StreamBox` now carries a `ContextHandle` dependency, populated at creation time. `get_stream_context()` returns it without a driver call.
- `Stream._from_handle` and `Stream_ensure_ctx` prefer the registry-backed handle before falling back to `cuStreamGetCtx`. This fixes a latent issue where streams created in a green context would lose their context association after a `set_current` swap.

Error handling:
- `dev.create_context()` without resources raises `ValueError` with a clear message.
- `nonblocking=False` is caught by the driver (`CUDA_ERROR_INVALID_VALUE`) and re-raised as `ValueError` with a helpful message.
- `cuCtxGetStreamPriorityRange` failure (`CUDA_ERROR_INVALID_CONTEXT`) raises "Call dev.set_current() before creating streams."

Version guards:
- `IF CUDA_CORE_BUILD_MAJOR >= 13` gates `cuDevSmResourceSplit` (the general/structured form).
- `cy_driver_version() >= (12, 4, 0)` for all green ctx APIs; `>= (13, 1, 0)` for structured splits. Raises `ValueError` when unsupported.
- `cuDevSmResourceSplitByCount` for basic (homogeneous) splits. Per-group `coscheduled_sm_count` and heterogeneous counts require 13.1+ and raise `NotImplementedError` on 12.x.
- `_get_optional_driver_fn`: graceful `NULL` when bindings lack the symbol.

Test coverage
33 tests in `test_green_context.py`, organized with proper pytest fixtures and classes:

- `sm_resource`, `wq_resource`, `green_ctx` (with `CUDAError` → skip), `fill_kernel`
- `_use_green_ctx` context manager for safe push/pop in set_current regression tests
- `TestSMResourceQuery`: properties, arch constraints (pre-Hopper vs Hopper+)
- `TestWorkqueueResource`: query, configure valid/invalid
- `TestSMResourceSplitValidation`: scalar/Sequence mismatch, negative count, dry-run blocked
- `TestSMResourceSplit`: single/two-group splits with arch-aligned counts, discovery mode, alignment, dry-run parity
- `TestGreenContextLifecycle`: `is_green`, `create_stream` on primary raises, blocking stream raises, explicit stream creation, stream/event context tracking, close-while-current guard, set_current regression
- `TestContextResources`: green ctx SM resources are subset of device, two contexts have disjoint partitions, stream.resources matches ctx.resources (SM + workqueue)
- `TestGreenContextKernelLaunch`: compile + launch + host-verify via `ctx.create_stream()`, two independent green contexts with different fill values, SM + workqueue combined

Validation
-- Leo's bot